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Abstract 

We explore the use of higher-order tail area approximations for Bayesian simula- 
tion. These approximations give rise to an alternative simulation scheme to MCMC 
for Bayesian computation of marginal posterior distributions for a scalar parame- 
ter of interest, in the presence of nuisance parameters. Its advantage over MCMC 
methods is that samples are drawn independently with lower computational time 
and the implementation requires only standard maximum likelihood routines. The 
method is illustrated by a genetic linkage model, a normal regression with censored 
data and a logistic regression model. 

Keywords: Asymptotic expansions, Bayesian computation, Marginal posterior distribu- 
tion, Modified profile likelihood root, Nuisance parameters, Tail area approximation. 

1 Introduction 

Consider a parametric statistical model with density f(y;9), with 9 = (if), A), 9 G 9 C 
H d , where ip is a scalar parameter of interest and A is a (d — l)-dimensional nuisance 
parameter, and let L(9) = L(ip, A) = L(ip, A; y) denote the likelihood function based on 
data y. Let ir(9) = 7r(^, A) be a prior distribution for (-0, A) and let ir(9\y) = X\y) oc 
7r(0, A) L(ip,\) be the posterior distribution of (xjj, A). Bayesian inference on ij), in the 
presence of the nuisance parameter A, is based on the marginal posterior distribution 

7rft%) = / n(^\\y)d\, (1) 



which is typically computed numerically. Indeed, cumbersome numerical integration may 
be necessary in order to compute (CEJ), in particular when d is large. A variety of Markov 
chain Monte Carlo (MCMC) schemes have been proposed in the literature in order to 
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approximate (jTJ; see, e.g.. lRobert and Casellal ( 120041 ) . All these techniques use simulation 



to avoid tailoring analytical work to specific models. However, these methods may present 
some difficulties, especially when d is large, and may have poor tail behaviour. 

Parallel with these developments has been the development of analytical higher-order 



appro ximations for parametric inference in small sample (see, e.g.. iBrazzale and Davison 



20081 . and references therein). Using higher-order asymptotics it is possible to avoid the 



difficulties related to numer ical methods and obtain accurate app roximations of ( ED) an d 



related quantities (see, e.g., iReidl . Il996l . 120031 ; ISweetingi . Il996l . and lBrazzale et all 120071 ). 



These methods are highly accurate in ma ny situations, but are nevert heless underused 
compared to simulation-based procedures (IBrazzale and Davison! 120081). 



St arting from highe r-o rder tail area approxi mations for ([T]) (see iDiciccio and Martin 



19911 : IReidl . 119961 120031 and IBrazzale et all . 120071 ) , this paper describes the implementation 



and the use of a sampling scheme that give rise to very accurate computation of marginal 
posterior densities, and related quantities, such as posterior summaries. The implemen- 
tation of the proposed higher-order tail area approximation (HOTA) sampling scheme is 
available at little additional computation cost over simple first-order approximations, and 
it has the advantage over MCMC methods that samples are drawn independently in much 
lower computation time. This technique applies to r e gular models only; precise regularity 
conditions for their validity are given in iKass et al.l ( 119901 ) . 

The paper is organized as follows. Section [2] briefly reviews higher-order approxi- 
mations for the marginal posterior distribution (Q]), and for the corresponding tail area. 
Section 3 describes the proposed HOTA sampling scheme and its implementation. Nu- 
merical examples and applications are discussed in Section 4. Finally, some concluding 
remarks are given in Section 5. 



2 Background on higher-order asymptotics 



Let £ p (ip) = log L(i[), X^) be the profile loglikelihood for ip, where A^ is the constrained 
maximum likelihood estimate (MLE) of A for fixed tp. Moreover, let 9 = (ip, A) be the 
full MLE, and let j p (ip) = —d 2 £ p (ip)/dip 2 be the observed information corresponding 
to the profile loglikelihood. Standard results for partitioned matrices give = 
A^I/Ij'aa^, A^)|, where j(9) = is the observed Fisher information from 

£(i/},\) = logL(^),A) and jxx(ip,X) is the (A, A)-block of j(i[),X). The basic requirement 
for the approximations given in this section is that th ere exists an uniq ue MLE, as occurs 
in many commonly used parametric models; see also Kass et all ( 1990 ). 

The marginal posterior distribution (ED) c an be approximated with the Laplace formula 
(see, e.g., iTierney and Kadand . Il986l ; IReidl . Il996l ). We get 



\jxxW, A^)| 1/2 tt(^,A) 



(2) 



where c is the normalizing constant and the symbol "=" indicates that the approximation 
is accurate to third-order, that means with error of order O p (n~ 3 ^ 2 ). Note that the 
approximation (J2]) depends on simple likelihood quantities and the prior evaluated at 
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("0, A) and at (i[),X^). Then, starting from (j2j), an application of the tail area argument 
gives the corres ponding third-order approximat i on to the marginal posterior tail area 
probability (see iDiciccio and Martini . Il99ll ; iReidl . 120031 ). We have 



Tr(ip\y) dtp = $(r* B (iJ)o)) 



where $(■) is the standard normal distribution function, r p (i/j) 
l p (il)))] l l 2 is the profile likelihood root, 



(3) 



sigm>-^)[2(£,(V>)- 



(4) 



and 



7b WO = ?" P (V>) + 



log 



(5) 



Formula ([3]) gives an explicit expression for the posterior quantiles, and s(ip) = 1 
$(r^ (if))) gives the Bayesian survivor probability with third-order accuracy. 



When the particular class of matching priors is considered in (TTT) (see iTibshirani 
19891). the marginal posterio r distribution for tJj ca n be expressed as (jVentura et al.l . 120091 : 
Ventura and Racugnol . l201ll ; IVentura et al.l . 120121 ) 



ir(ip\y) oc L mp (ijj) n mp (ip) 



(6) 



where L mp (ip) = L p (ip)M(ip) is the modified profile likelihood for a suitably defined correc 



ed p 

tion term M(tp) (see, among others. ISeverinil . 120001 . Chap. 9 and lPace and Salvanl . 120061 ) 
and iTmpiip) oc ViM^-V) is the corresponding matching prior, with = 
A) - i^,x(i>, A)«aa(^, A) -1 ^^, A) the partial information, and A), A) 

and i\^{ip,\) blocks of the expected Fisher information i(ip,\) from £(if),\) 
A ccurate tail area probabilities are computable by direct integration of ([6]). In particular 



in 



Ventura and Racugnol ( 2011 ) it is shown that holds with 



r£M = r*M, (7) 

where r*(ip) is t he modified profile likelihood root of Barndorff- Nielsen and Chamber linl 
(119941 ); see also iBarndorff-Nielsen and Cox! (119941 ) and ISeverinil (120001 . Chap. 7). The 
quantity r*(ip) has the form (5) with 



<"ipip.\ 



1/2 



(8) 



3 Posterior simulation via tail area approximations 

Equation (3) can be used to calculate quantiles of the marginal posterior distribution of 
■0 but can not be used for computing moments, highest posterior density (HPD) regions, 
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or to plot its density function. In this section the higher-order tail area approxima- 
tion (HOTA) sampling scheme is presented whereby posterior summary via empirical 
moments, HPD, etc. is possible. Starting from ([3]), the simulation algorithm can be 
summarized as in Algorithm [TJ The quantities required for the implementation of the 
HOTA scheme are, apart from 9 and the corresponding loglikelihood value, the profile 
score £'(if)) the constrained MLE X^, and the curvature of the likelihood evaluated at the 
constrained MLE. 

If code for the full likelihood of the model is available, constrained maximization and 
computation of the hessian is generally straightforward. The profile likelihood may by ob- 
tained by considering a grid of values of if) and the related loglikelihood on the constrained 
estimate. After that, a spline interpolator may be applied, and the profile loglikelihood 
and its derivative can be easily evaluated. For many statistical models, available R func- 
tions can be used directly for computing the full and constrained maximization, and 
consequently for the related profile quantities. For instance, the command glm of R can 
handle many generalized linear models, and it offers the o ption offset for cons trained 



estimation. For more details on implementation issues see iBrazzale et al.l (120071 Chap. 
9). 

Algorithm 1 HOTA simulation from the marginal posterior distribution n(if)\y) 
1: for t = 1 T do 

2: generate a pseudo-random number z ~ N(0, 1) 
3: solve r B (if>) = z in if) 
4: end for 

5: store the vector if) as a sample from ir(if>\y). 

Typically, r B (if>) is a monotonically decreasing function in if) and has a numerical 
discontinuity at if). In order to implement Algorithm 1, the inverse of r B (if>) is required. 
This may be accomplished by first applying the function r B (if)) to a grid of if) values and, 
secondly, smoothing its values by a spline interpolator, e.g. with the smooth. spline 
command in R environement. The values of if) may be set equal to the limits of the Wald 
confidence interval if) ±4SE(if>), or its deviance version, where SE(ifj) = j p (if))~ 1/2 . If the 
prior information is rather strong, then they can be changed in a trial-and-error fashion by 
looking at the r B {ip) curve. The main point is that the range of values of r B (if>) should be 
wide enough to include all the most probable values of the standard normal distribution, 
e.g. -4 and 4. Lastly, in order to avoid numerical issues in the spline smoothing step, 
it may be necessary to exclude values of if> close to if). For instance, during the grid 
constructi on one may exc l ude y alues of if) in (ip — 5SE(if>), if> + 5SE(if>)), for some small 



6; see also IBrazzale et al.l (120071 . Chap. 9). 

The HOTA simulation procedure is essentially an inverse method of sampling and 
it gives independent samples from ([T|) by inverting the cumulative distribution function 
approximation (E]). In this respect, it has an obvious advantage over MCMC methods 
which usually requires more attention form the practitioner. Moreover, HOTA is almost 
automatically obtained from likelihood quantities, which opens the possibility of doing 
Bayesian inferen ce with maximum likeliho o d rou tines. We notice that a similar approach 



was suggested in iKharroubi and Sweetingl (120101 ). where the same type of approximation 
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methods were used but through the importance sampling technique. Also their approxi- 
mation methods seems to be quite accurate. Nevertheless, as far as the simulation from 
7r(if}\y) is concerned, we think HOTA has an obvious advantage over the latter in that it 
is easier to implement and computationally faster. 



4 Examples 

The aim of this section is to illustrate the performance and the advantage of the HOTA 
sampling scheme through three examples. In all but the first example, the HOTA scheme 
is compared with a typical posterior approximation technique, namely the Metropolis- 
Hastings (M-H), which is one of the MCMC methods most widely used in practice. 

MCMC methods give autocorrelated posterior samples and, before their use, it is 
important to check that the chain has converged to its ergodic distribution (see, e.g., 



Gelman et all 120031 ). For the two examples where M-H is used, the convergence to the 
posterior was carefully checked by the routines of coda packages of R. As already stressed, 
HOTA algorithm gives independent samples. Therefore, in order to fairly compare HOTA 
results with the MCMC method, we need the latter to have as low autocorrelation as 
possible. Hence, the chains were run for a very large number of iterations, previously 
thinned and the initial observations discarded. After all these steps we end up with 10 5 
simulations in each example, with no significant autocorrelation for time lags greater than 
10. 



4.1 Example 1: Genetic linkage model 



We st ar t with a simple scalar p arameter problem, which has been studied also in iRao 
(11973 ). iDempster et all (119771 ). iTanner and Wona (119871 ) and IKharroubi and Sweeting 



(120101 ). among others. It concerns a genetic linkage model in which n individuals are 



distributed multinomially into four categories with cell probabilities (| 



4> 4' 



o),l), with e 

expression Q 



G (0,1). In this situation there are no nuisance parameters. Therefore, 
simplifies to 



tt(%) d6 = $(r* B (6 )), 



where r* B {9) = r(9) + r(9)~ 1 \og{q B (9) /r(9)}, q B {9) = £'(0)j(0)~ 1/2 , ?(6) = d£(9)/d9, and 

r(9 ) = sim( 9 -9)\2 ( 1(9) - K9) )} 1 / 2 . 

Wei and Tanner dl990h and IKharroubi and Sweetingl d2010h apply this model to n — 
20 animals and cell counts y = (14,0,1,5). Under a uniform prior for 9, the posterior 
density of 9 is 

tt(%) oc (2 + 9f\l - 9)9\ 9 G (0, 1). 

Figure [Tj shows the posterior distribution computed with the HOTA algorithm (dashed) 
and the exact posterior distribution n(9\y) (solid line). For the computation with HOTA 
algorithm, a smoothing spline was run over a grid 50 values of 9 and a sample of T = 10 5 



5 



points were considered. In this simple example the sample, on an Intel i5 machine with 4 
GB RAM, was obtained in less than a second. 




Figure 1: Exact and HOTA posterior distribution in the genetic linkage model. 



The posterior distribution appears to be extremely skewed to the right (see lTanner and Wong 

19871 ). with a long left tail and in this case one might expect HOTA approximation to 
fail. On the contrary, HOTA algorithm gets very close to the true posterior, albeit the 
sample size is n = 20. 

In order to further explore the behavior of the HOTA algorithm, the two posteriors 
are compared also in terms of some summary statistics (mean, standard deviation, 2.5 
percentile, median, 97.5 percentile and 0.95 HPD credible set) in Table 1. The HOTA 



Posterior 


Mean 


St. Dev. 


Qo.025 


Median 


Qo.975 


0.95 HPD 


Exact 
HOTA 


0.831 
0.827 


0.108 
0.109 


0.57 
0.563 


0.852 
0.848 


0.978 
0.976 


(0.62, 0.994) 
(0.617, 0.994) 



Table 1: Numerical summaries of the exact and HOTA posterior distribution in the genetic 
linkage model. 



results are very close to those based on the true posterior. 



4.2 Example 2: Censored regression 



The data are taken fromlSchmee and Hahnl (119791) and were anal yzed from a Bayesian per- 
specti ve in lNaylor and Smith! ( 119821 ) , IWei and Tanner! (119901 ) and lKharroubi and Sweeting 
(120101 ) . among others. The data consists on temperature accelerated life tests on electrical 
insulation in n = 40 motorettes. Ten motorettes were tested at each of four temperatures 
in degrees Centigrade (150°, 17°, 190° and 220°), the test termination (censoring) time 
being different at each temperature. 
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As in 



Schmee and Hahn 



(119791 ). we consider the following model 



y% = Po + PiZi + ae h i = 1, . . . , n, 

where jji is the log 10 (failuretime) with time in hours, Xi = 1000/(temperature + 273.2) 
and 6i are independent standard normal errors. Reordering the data so that the first m 
observations are uncensored, with observed log- failure times yt, and the remaining n — m 
are censored at times itj, the loglikelihood function for 9 = (/3 , 0i, cr) can be written as 

i(e) = -mloga-—J2(yi-Po-/3iXi) 2 + ^ log 1 1 - $ 

i=l i=m+l ^ 

In this example, for the parameter vector (/3 , Pi, t), with r = log a, the non-informative 
prior 7r(/3 , j3±, r) oc 1 is assumed. Clearly, the posterior distribution 7r(/3 , r\y) does not 
have a closed form expression and direct integration is not possible in order to compute 
n{ip\y) and related quantities, where ip is one of the parameters of the model. Therefore 
numerical or analytical approximations are needed. 



MCMC 

- - - - HOTA 



U; 



00 - f3tXi 



a 




-1.0 



-2.0 



-1.5 



-0.5 



Figure 2: HOTA and MCMC marginal posterior distributions of r in the censored regres- 
sion. 

The marginal posterior densities n(f3o\y), 7i((3i\y) and Ti(r\y) were computed both with 
the HOTA algorithm and, for comparison purposes, with the MCMC algorithm. For the 
computation with HOTA, grids of 50 points were chosen for all the parameters and the 
total number of simulations was T = 10 5 . The computational time was less than 2 seconds 
whereas the computation with MCMC took about 95 seconds. 

As an example, Figure [2] illustrates the HOTA and MCMC marginal posterior dis- 
tributions of r. Clearly, the distributions have a moderate positive asymmetry and the 
HOTA sampling scheme and MCMC give very similar results. 

In Table 2 the two methods are compared also with respect to some summary statistics 
calculated over the three marginal posterior distributions. The results based of the two 
methods are in good agreement, and this holds also for the 0.95 HPD credible sets. 
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Method 


Par am. 


Mean 


St Dev. 


<5o.025 


Median 


Qo.975 


0.95 HPD 


MCMC 


T 


-1.24 


0.201 


-1.60 


-1.253 


-0.811 


(-1.616, -0.832) 


HOTA 


-1.24 


0.202 


-1.601 


-1.251 


-0.808 


(-1.624, -0.837) 


MCMC 


A) 


-6.204 


1.117 


-8.57 


-6.139 


-4.149 


( -8.413, -4.01) 


HOTA 


-6.191 


1.128 


-8.596 


-6.134 


-4.13 


(-8.475, -4.038) 


MCMC 


ft 


4.409 


0.518 


3.461 


4.382 


5.512 


(3.425, 5.47) 


HOTA 


4.401 


0.521 


3.459 


4.37 


5.521 


(3.398, 5.443) 



Table 2: Numerical summaries of the MCMC and HOTA marginal posterior distributions 
in the censored regression. 



4.3 Example 3: Logistic regression 

In this exa mple we consider a log i stic r egression model applied to the urine dataset 



reported in Andrews and Herzbergi (119851 ); see also iBrazzale et al.l (120071 . Chap. 4). This 
dataset concerns calcium oxalate crystals in samples of urine. The binary response y is an 
indicator of the presence of such crystals, and there are six explanatory variables: specific 
gravity (gravity), i.e. the density of urine relative to water; pH (ph); osmolarity (osmo, 
mOsm); conductivity (conduct, mMho); urea concentration (urea, millimoles per litre); 
and calcium concentration (calc, millimoles per litre). After dropping the two incomplete 
cases, the dataset consists of 77 observations. 

Let us denote by X the (n x 7) design matrix composed by a vector of ones and the 
six covariates, and let (3 = (/3 , • • • , fis) be the vector of parameters, with f3 Q being the 
intercept. The loglikelihood function for /3 is 

n 

m = y T X(3 - log{l + exp{xf /?}}, 

i=l 

where Xi represents the the i-th row of X and y is the vector of binary responses. For 
illustrative purposes let us consider the non-informative prior tt(/3) oc 1, which leads to a 
posterior distribution being proportional to the likelihood. 

From a computational point of view, as far as HOTA is concerned, a grid of 50 points 
were used and a sample of T = 10 5 values were simulated from each marginal posterior 
in about 5 seconds. Whereas, the computation time with MCMC was about 100 seconds. 
The comparison of the HOTA algorithm with MCMC is illustrated for the marginal 
posterior of (3 6 in Figure EJ The posterior distribution of /3 6 appears to be slightly skewed. 
Again HOTA and MCMC produce very similar results. 

Let us further consider the posterior distributions of the last three parameters, namely 
/?4, /?5 and fis, over which we calculate the summary statistics considered also in the previ- 
ous examples. These parameters represent the effects of conductivity, ure a concentration 



and ca lcium concentration respectively. From a frequentist point of view, IBrazzale et al. 



(120071 ) found that the two tailed p-values associated to these parameters were respectively 
0.08493, 0.04703 and 0.00121 and, under frequentist higher order refinements, only /3 6 was 
significantly different from zero at 5% significance level. Bayesian analysis, instead, sug- 
gests that with 0.95 probability the first two variables have a negative effect whereas 
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0.0 



0.5 



1.0 



1.5 



2.0 



Figure 3: HOTA and MCMC marginal posterior distributions of /3 e in the logistic regres- 
sion. 



the last one has a positive effect (see the 0.95 HPDs in Table 3). Moreover, with 0.99 
probability only the HPD for calcium concentration does not include zero. 



Method 


Param. 


Mean 


St Dev. 


Qo.025 


Median 


Qo.975 


0.95 HPD 


MCMC 
HOTA 




-0.536 
-0.546 


0.277 
0.281 


-1.108 
-1.128 


-0.525 
-0.535 


-0.025 
-0.026 


(-1.084, -0.005) 
(-1.108, -0.01) 


MCMC 
HOTA 




-0.039 
-0.039 


0.018 
0.018 


-0.076 
-0.077 


-0.038 
-0.039 


-0.006 
-0.006 


(-0.075, -0.005) 
(-0.076, -0.005) 


MCMC 
HOTA 


Pb 


0.935 
0.926 


0.268 
0.267 


0.471 
0.466 


0.914 
0.904 


1.524 
1.509 


(0.43, 1.464) 
(0.429, 1.459) 



Table 3: Numerical summaries of the MCMC and HOTA marginal posterior distributions 
for /3 4 , (3 5 and (3 G in the logistic regression. 



Another advantage of the HOTA simulation scheme is that sensitivity analyses with 
respect to the prior can be done very quickly. In fact, for a different prior distribution, 
say tti(9), the values of 9 and 9^ remain the same and only the prior ratio in equations @ 
and ([5]) has to be recomputed. Moreover, using the same set of normal random numbers 
one can compare the prior's effect on the posterior distribution under the same Monte 
Carlo variation. 

As an illustrative example we compare the marginal posterior distribution for /3 6 under 
the matching prior 7T mp (ijj) and the multivariate normal prior with independent compo- 
nents. For the logistic regression model and in the case of the matching prior (see Sec. 
[2]), equation ((H]) becomes 

' | Jaa( ^ A) |i/ 2 
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Whereas the multivariate normal prior, very commonly used in practice, can be written 
as 7r(/3) ~ Nd(/i ,S ). For illustration purposes we take /io = Od where 0^ is a (1 x d) 
vector of zeros and So = kid-, with Id the (d x d) identity matrix. In our case d = 7 and 
we consider three possible alternatives for k, namely 10, 35 and 100. Figure |4] illustrates 



Flat 

Normal k=1 




Figure 4: HOTA marginal posteriors of fi§ with the flat (solid line), normal k — 10 
(dashed), normal k = 35 (dotted), normal k = 100 (dash-dotted) and the matching (long 
dashed) prior. 

the marginal posterior distributions for /3 6 with the four different priors along with the 
flat prior, all computed both with HOTA. Lastly, recall that the flat prior can also be 
seen as a normal prior with zero mean and infinite variance, e.g. k = oo. As expected, 
the posterior distribution with lower k tends to be more concentrated and closer to zero, 
whereas as k increases the influence of the prior diminishes and the posterior is dominated 
by the data. In fact for k = 100 and k = oo, e.g. the flat prior, the two posteriors are 
almost indistinguishable. Interestingly, also the matching prior appears to be quite far 
from the non-informative flat prior and closer to the normal prior with k = 10. 

The comparison between MCMC and HOTA marginal posterior distributions for /3 6 
is illustrated in Figure |5j Since the matching prior is defined for a scalar parameter 
only, e.g. vr mp (/3 6 ), this advantage in terms of elicitation becomes a problem for MCMC. 
Therefore in this case we avoid the comparison of MCMC with HOTA. We notice a slight 
discrepancy in the agreement between HOTA and MCMC approximation as k gets lower. 
We will discuss more on this point in SectionSJ 

As a numeric check, Table 4 gives some summary statistics of the marginal posterior 
distributions computed with HOTA and MCMC for the four alternative prior distribu- 
tions, namely the matching prior and the multivariate normal prior with different values 
of k. We note that the marginal posterior for (3 5 seems quite robust with respect to the 
prior specification and HOTA and MCMC are in a stunning agreement. Whereas for 
/?4 and there are some minor differences. For instance, with the matching prior the 
0.95 HPD of 04 includes the zero value, whereas the 0.95 HPDs based on the other two 
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Normal k=10 



Normal k=35 




Figure 5: HOTA and MCMC marginal posteriors of /3q with normal k = 10, normal 
k = 35, normal k = 100 and the flat prior. 



marginal posteriors obtained with the same prior do not include the zero value. The 
agreement between HOTA and MCMC remains still at an acceptable level. 



5 Concluding remarks 

By considering higher-order tail area approximations of a marginal posterior distribution 
for a scalar parameter of interest, we illustrate the HOTA algorithm for simulating in- 
dependent samples from the posterior distribution. The HOTA algorithm is essentially 
based on standard likelihood quantities, and it opens the possibility of doing Bayesian in- 
ference with usual maximum likelihood routines' output and prior distributions. In cases 
where it can be applied, the HOTA scheme has the advantage over MCMC methods that 
it does not need convergence check and simulations are obtained almost in an automatic 
way, without having to rely on proposal distributions (like in M-H), importance function 
(importance sampling), full conditional posterior distribution (Gibbs sampling) or latent 
structure representation of the posterior (data augmentation). 

The HOTA sampling scheme may be a very us eful tool in the co ntext of Bayesian ro- 



bustness with respect to the prior distribution (see lKass et all Il989l ). Moreover, with the 
HOTA algorithm prior's effect on the posterior distribution can be appreciated with the 
Monte Carlo variation being the same by just using the same set of normal random num- 
bers. Sometimes, repeated sampling properties of the posterior distribution are needed, 
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Prior 


Method 


Param. 


Mean 


St Dev. 


Qo.025 


Median 


Qo.975 


0.95 HPD 


Matching 


HOTA 




-0.508 


0.273 


-1.08 


-0.496 


-0.007 


(-1.048, 0.02) 


N, k = 10 


MCMC 




-0.59 


0.257 


-1.123 


-0.579 


-0.117 


(-1.092, -0.093) 


N, = 10 


HOTA 




-0.601 


0.243 


-1.095 


-0.593 


-0.145 


(-1.094, -0.144) 


N, jfe = 35 


MCMC 


Pa 


-0.555 


0.266 


-1.107 


-0.545 


-0.061 


(-1.088, -0.046) 


N, k = 35 


HOTA 




-0.564 


0.27 


-1.123 


-0.554 


-0.062 


(-1.106, -0.049) 


N, jfe = 100 


MCMC 




-0.544 


0.273 


-1.107 


-0.532 


-0.039 


(-1.088, -0.026) 


N, jfe = 100 


HOTA 




-0.554 


0.276 


-1.127 


-0.543 


-0.046 


(-1.107, -0.029) 


Matching 


HOTA 




-0.037 


0.018 


-0.074 


-0.037 


-0.005 


(-0.072, -0.003) 


N, jfe = 10 


MCMC 




-0.04 


0.017 


-0.075 


-0.04 


-0.009 


(-0.074, -0.008) 


N, jfe = 10 


HOTA 




-0.041 


0.016 


-0.074 


-0.04 


-0.011 


(-0.073 -0.01) 


N, jfe = 35 


MCMC 


fa 


-0.04 


0.017 


-0.075 


-0.039 


-0.008 


(-0.074, -0.007) 


N, jfe = 35 


HOTA 




-0.04 


0.017 


-0.076 


-0.039 


-0.007 


(-0.075, -0.007) 


N, jfe = 100 


MCMC 




-0.039 


0.018 


-0.075 


-0.039 


-0.007 


(-0.075, -0.006) 


N, jfe = 100 


HOTA 




-0.04 


0.018 


-0.077 


-0.039 


-0.007 


(-0.075, -0.006) 


Matching 


HOTA 




0.862 


0.257 


0.419 


0.841 


1.422 


(0.375, 1.365) 


N, jfe= 10 


MCMC 




0.884 


0.25 


0.451 


0.866 


1.426 


(0.433, 1.401) 


N, jfe = 10 


HOTA 




0.873 


0.239 


0.454 


0.858 


1.384 


(0.43, 1.353) 


N, jfe = 35 


MCMC 


fit 


0.909 


0.259 


0.458 


0.889 


1.468 


(0.427, 1.425) 


N, jfe = 35 


HOTA 




0.909 


0.258 


0.462 


0.889 


1.468 


(0.44, 1.436) 


N, jfe = 100 


MCMC 




0.924 


0.264 


0.468 


0.902 


1.496 


(0.435, 1.45) 


N, jfe = 100 


HOTA 




0.922 


0.263 


0.468 


0.901 


1.493 


(0.435, 1.445) 



Table 4: Numerical summaries of the marginal posteriors of $4, (3$ and /3g with the normal 
prior (N) and the matching prior based on MCMC and the HOTA approximation. 



and also in this respect HOTA may be very useful because of its reduced computation 
time. Finally, in the examples considered in this article, the algorithm appears to be 
accurate also for small sample sizes. 

As underlined in Section 14.31 informative priors may imply a slight deterioration in 
HOTA's accuracy. This is not surprising. Indeed, HOTA is essentially based on the MLE 
and if the prior information is large, then the MLE and the posterior mode could be 
substantially different and obviously approximating the posterior distribution around the 
MLE could be misleading. An instan ce of this situation happens if one uses the Zellner's 



G-prior (see iMarin and Robertl . 120071 . p. 107) in the logistic regression of Section 14.31 In 



fact, in this case the prior distribution puts a considerable mass on zero and tends to be 
flat elsewhere. Therefore, the posterior mode and MLE are remarkably different and, the 
HOTA's accuracy slightly deteriorates. Clearly, this is due also to the moderate sample 
size. 

The problem of ac curacy when p osterior mode and MLE are different is already known 



in the literature (see lDavisonll2003l Chap. 11) and we stress that it is a minor issue for 



at least three reasons. First, one can incorporate the prior in the likeliho od function and 



consider the Laplace expansion on the posterior mode instead of the MLE. iDavisonl (120031 . 
p. 600) gives the expression of the tail area probability in this case and he claims that 
this approximation tends to be more accurate than the usual one. Second, priors such as 
the G-prior were essentially suggested as non-informative priors for model choice, which 
is a topic we are not addressing here. Nevertheless, in Bayesian estimation the use of 
flat priors or proper priors, perhaps with large variance, is a more usual practice; see for 
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instance iGelman et al.l ( 120031 ). Lastly, since the HOTA simulation scheme is based on a 
third-order tail area approximation we expect its accuracy to increase with sample size. 

The proposed procedure can be applied only for marginal posterior distributions for 
scalar components. How to extend it to multidimensional marginal posterior distributions 
is an open question. This is not a trivial problem since HOTA is based on the higher-order 
approximation (3) which, at present, does not exist for dimension larger than one. 
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